Inter-layer reference picture processing for coding standard scalability

ABSTRACT

Video data are coded in a coding-standard layered bit stream. Given a base layer (BL) and one or more enhancement layer (EL) signals, the BL signal is coded into a coded BL stream using a BL encoder which is compliant to a first coding standard. In response to the BL signal and the EL signal, a reference processing unit (RPU) determines RPU processing parameters. In response to the RPU processing parameters and the BL signal, the RPU generates an inter-layer reference signal. Using an EL encoder which is compliant to a second coding standard, the EL signal is coded into a coded EL stream, where the encoding of the EL signal is based at least in part on the inter-layer reference signal. Receivers with an RPU and video decoders compliant to both the first and the second coding standards may decode both the BL and the EL coded streams.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/706,480 filed 27 Sep. 2012, which is hereby incorporated by reference in its entirety.

TECHNOLOGY

The present invention relates generally to images. More particularly, an embodiment of the present invention relates to inter-layer reference picture processing for coding-standard scalability.

BACKGROUND

Audio and video compression is a key component in the development, storage, distribution, and consumption of multimedia content. The choice of a compression method involves tradeoffs among coding efficiency, coding complexity, and delay. As the ratio of processing power to computing cost increases, more complex compression techniques that achieve more efficient compression become practical. As an example, in video compression, the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) has continued improving upon the original MPEG-1 video standard by releasing the MPEG-2, MPEG-4 (part 2), and H.264/AVC (or MPEG-4, part 10) coding standards.

Despite the compression efficiency and success of H.264, a new generation of video compression technology, known as High Efficiency Video Coding (HEVC), is now under development. HEVC, for which a draft is available in “High efficiency video coding (HEVC) text specification draft 8,” ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-J1003, July 2012, by B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, and T. Wiegand, which is incorporated herein by reference in its entirety, is expected to provide improved compression capability over the existing H.264 (also known as AVC) standard, published as, “Advanced Video Coding for generic audio-visual services,” ITU-T Rec. H.264 and ISO/IEC 14496-10, which is incorporated herein in its entirety. As appreciated by the inventors here, it is expected that over the next few years H.264 will still be the dominant video coding standard used worldwide for the distribution of digital video. It is further appreciated that newer standards, such as HEVC, should allow for backward compatibility with existing standards.

As used herein, the term “coding standard” denotes compression (coding) and decompression (decoding) algorithms that may be standard-based, open-source, or proprietary, such as the MPEG standards, Windows Media Video (WMV), flash video, VP8, and the like.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:

FIG. 1 depicts an example implementation of a coding system supporting coding-standard scalability according to an embodiment of this invention;

FIG. 2A and FIG. 2B depict example implementations of a coding system supporting AVC/H.264 and HEVC codec scalability according to an embodiment of this invention;

FIG. 3 depicts an example of layered coding with a cropping window according to an embodiment of this invention;

FIG. 4 depicts an example of inter-layer processing for interlaced pictures according to an embodiment of this invention;

FIG. 5A and FIG. 5B depict examples of inter-layer processing supporting coding-standard scalability according to an embodiment of this invention;

FIG. 6 depicts an example of RPU processing for signal encoding model scalability according to an embodiment of this invention;

FIG. 7 depicts an example encoding process according to an embodiment of this invention;

FIG. 8 depicts an example decoding process according to an embodiment of this invention; and

FIG. 9 depicts an example decoding RPU process according to an embodiment of this invention.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Inter-layer reference picture processing for coding-standard scalability is described herein. Given a base layer signal, which is coded by a base layer (BL) encoder compliant to a first coding standard (e.g., H.264), a reference processing unit (RPU) process generates reference pictures and RPU parameters according to the characteristics of input signals in the base layer and one or more enhancement layers. These inter-layer reference frames may be used by an enhancement layer (EL) encoder which is compliant to a second coding standard (e.g., HEVC), to compress (encode) one or more enhancement layer signals, and combine them with the base layer to form a scalable bit stream. In a receiver, after decoding a BL stream with a BL decoder which is compliant to the first coding standard, a decoder RPU may apply received RPU parameters to generate inter-layer reference frames from the decoded BL stream. These reference frames may be used by an EL decoder which is compliant to the second coding standard to decode the coded EL stream.

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily obscuring the present invention.

Overview

Example embodiments described herein relate to inter-layer reference picture processing for coding-standard scalability. In one embodiment, video data are coded in a coding-standard layered bit stream. Given base layer (BL) and enhancement layer (EL) signals, the BL signal is coded into a BL stream using a BL encoder which is compliant to a first encoding standard. In response to the BL signal and the EL signal, a reference processing unit (RPU) determines RPU processing parameters. In response to the RPU processing parameters and the BL signal, the RPU generates an inter-layer reference signal. Using an EL encoder which is compliant to a second coding standard, the EL signal is coded into a coded EL stream, where the encoding of the EL signal is based at least in part on the inter-layer reference signal.

In another embodiment, a receiver demultiplexes a received scalable bitstream to generate a coded BL stream, a coded EL stream, and an RPU data stream. A BL decoder compliant to a first coding standard decodes the coded BL stream to generate a decoded BL signal. A receiver with an RPU may also decode the RPU data stream to determine RPU processing parameters. In response to the RPU processing parameters and the BL signal, the RPU may generate an inter-layer reference signal. An EL decoder compliant to a second coding standard may decode the coded EL stream to generate a decoded EL signal, where the decoding of the coded EL stream is based at least in part on the inter-layer reference signal.

Layer-Based Coding-Standard Scalability

Compression standards such as MPEG-2, MPEG-4 (part 2), H.264, flash, and the like are being used worldwide for delivering digital content through a variety of media, such as DVD discs or Blu-ray discs, or for broadcasting over the air, cable, or broadband. As new video coding standards, such as HEVC, are developed, adoption of the new standards could be increased if they supported some backward compatibility with existing standards.

FIG. 1 depicts an embodiment of an example implementation of a system supporting coding-standard scalability. The encoder comprises a base layer (BL) encoder (110) and an enhancement layer (EL) encoder (120). In an embodiment, BL Encoder 110 is a legacy encoder, such as an MPEG-2 or H.264 encoder, and EL Encoder 120 is a new standard encoder, such as an HEVC encoder. However, this system is applicable to any combination of either known or future encoders, whether they are standard-based or proprietary. The system can also be extended to support more than two coding standards or algorithms.

According to FIG. 1, an input signal may comprise two or more signals, e.g., a base layer (BL) signal 102 and one or more enhancement layer (EL) signals, e.g., EL 104. Signal BL 102 is compressed (or coded) with BL Encoder 110 to generate a coded BL stream 112. Signal EL 104 is compressed by EL encoder 120 to generate coded EL stream 122. The two streams are multiplexed (e.g., by MUX 125) to generate a coded scalable bit stream 127. In a receiver, a demultiplexer (DEMUX 130) may separate the two coded bit streams. A legacy decoder (e.g., BL Decoder 140) may decode only the base layer 132 to generate a BL output signal 142. However, a decoder that supports the new encoding method (EL Encoder 120) may also decode the additional information provided by the coded EL stream 134 to generate EL output signal 144. BL decoder 140 (e.g., an MPEG-2 or H.264 decoder) corresponds to the BL encoder 110. EL decoder 150 (e.g., an HEVC decoder) corresponds to the EL Encoder 120.

Such a scalable system can improve coding efficiency compared to a simulcast system by properly exploiting inter-layer prediction, that is, by coding the enhancement layer signal (e.g., 104) while taking into consideration information available from the lower layers (e.g., 102). Since the BL Encoder and EL Encoder comply with different coding standards, in an embodiment, coding-standard scalability may be achieved through a separate processing unit, the encoding reference processing unit (RPU) 115.

RPU 115 may be considered an extension of the RPU design described in PCT Application PCT/US2010/040545, “Encoding and decoding architecture for format compatible 3D video delivery,” by A. Tourapis, et al., filed on Jun. 30, 2010, and published as WO 2011/005624, which is incorporated herein by reference for all purposes. The following descriptions of the RPU apply, unless otherwise specified to the contrary, both to the RPU of an encoder and to the RPU of a decoder. Artisans of ordinary skill in fields that relate to video coding will understand the differences, and will be capable of distinguishing between encoder-specific, decoder-specific and generic RPU descriptions, functions and processes upon reading of the present disclosure. Within the context of a video coding system as depicted in FIG. 1, the RPU (115) generates inter-layer reference frames based on decoded images from BL Encoder 110, according to a set of rules for selecting different RPU filters and processes.

The RPU 115 enables the processing to be adaptive at a region level, where each region of the picture/sequence is processed according to the characteristics of that region. RPU 115 can use horizontal, vertical, or two dimensional (2D) filters, edge adaptive or frequency based region-dependent filters, and/or pixel replication filters or other methods or means for interlacing, de-interlacing, filtering, up-sampling, and other image processing.

An encoder may select RPU processes and output regional processing signals, which are provided as input data to a decoder RPU (e.g., 135). The signaling (e.g., 117) may specify the processing method on a per-region basis. For example, parameters that relate to region attributes such as the number, size, shape and other characteristics may be specified in an RPU-data related data header. Some of the filters may comprise fixed filter coefficients, in which case the filter coefficients need not be explicitly signaled by the RPU. Other processing modes may comprise explicit modes, in which the processing parameters, such as coefficient values, are signaled explicitly. The RPU processes may also be specified for each color component.

The RPU data signaling 117 can either be embedded in the encoded bitstream (e.g., 127), or transmitted separately to the decoder. The RPU data may be signaled along with the layer on which the RPU processing is performed. Additionally or alternatively, the RPU data of all layers may be signaled within one RPU data packet, which is embedded in the bit stream either prior to or subsequent to embedding EL encoded data. The provision of RPU data may be optional for a given layer. In the event that RPU data is not available, a default scheme may be used for up-conversion of that layer. Similarly, the provision of an enhancement layer encoded bit stream is also optional.

An embodiment allows for multiple possible methods of selecting processing steps within an RPU. A number of criteria may be used separately or in conjunction in determining RPU processing. The RPU selection criteria may include the decoded quality of the base layer bitstream, the decoded quality of the enhancement layer bitstreams, the bit rate required for the encoding of each layer including the RPU data, and/or the complexity of decoding and RPU processing of the data.

The RPU 115 may serve as a pre-processing stage that processes information from BL encoder 110, before utilizing this information as a potential predictor for the enhancement layer in EL encoder 120. Information related to the RPU processing may be communicated (e.g., as metadata) to a decoder as depicted in FIG. 1 using an RPU Layer stream 136. RPU processing may comprise a variety of image processing operations, such as: color space transformations, non-linear quantization, luma and chroma up-sampling, and filtering. In a typical implementation, the EL 122, BL 112, and RPU data 117 signals are multiplexed into a single coded bitstream (127).

Decoder RPU 135 corresponds to the encoder RPU 115, and with guidance from RPU data input 136, may assist in the decoding of the EL layer 134 by performing operations corresponding to operations performed by the encoder RPU 115.

The embodiment depicted in FIG. 1 can easily be extended to support more than two layers. Furthermore, it may be extended to support additional scalability features, including: temporal, spatial, SNR, chroma, bit-depth, and multi-view scalability.

H.264 and HEVC Coding-Standard Scalability

FIG. 2A and FIG. 2B depict an example embodiment of layer-based coding-standard scalability as it may be applied to the HEVC and H.264 standards. Without loss of generality, FIG. 2A and FIG. 2B depict only two layers; however, the methods can easily be extended to systems that support multiple enhancement layers.

As depicted in FIG. 2A, both H.264 encoder 110 and HEVC encoder 120 comprise intra prediction, inter prediction, forward transform and quantization (FT), inverse transforms and quantization (IFT), entropy coding (EC), deblocking filters (DF), and Decoded Picture Buffers (DPB). In addition, an HEVC encoder also includes a Sample Adaptive Offset (SAO) block. In an embodiment, as will be explained later on, RPU 115 may access BL data either before the deblocking filter (DF) or from the DPB. Similarly, in a multi-standard decoder (see FIG. 2B), decoder RPU 135 may also access BL data either before the deblocking filter (DF) or from the DPB.

In scalable video coding, the term “multi-loop solution” denotes a layered decoder where pictures in an enhancement layer are decoded based on reference pictures extracted from both the same layer and other sub-layers. The pictures of the base/reference layers are reconstructed and stored in the Decoded Picture Buffer (DPB). These base layer pictures, called inter-layer reference pictures, can serve as additional reference pictures in decoding the enhancement layer. The enhancement layer then has the option to use either temporal reference pictures or inter-layer reference pictures. In general, inter-layer prediction helps to improve the EL coding efficiency in a scalable system. Since AVC and HEVC are two different coding standards and they use different encoding processes, additional inter-layer processing may be required to guarantee that AVC-coded pictures are considered valid HEVC reference pictures. In an embodiment, such processing may be performed by RPU 115, as will be explained next for various cases of interest. For coding-standard scalability, the use of RPU 115 aims to resolve the differences or conflicts arising from using two different standards, both at the high-level syntax level and at the coding tools level.

Picture Order Count (POC)

HEVC and AVC have several differences at the high level syntax. In addition, the same syntax may have a different meaning in each standard. The RPU can work as a high-level syntax “translator” between the base layer and the enhancement layer. One such example is the syntax related to Picture Order Count (POC). In inter-layer prediction, it is important to synchronize the inter-layer reference pictures from the base layer with the pictures being encoded in the enhancement layer. Such synchronization is even more important when the base layer and the enhancement layers use different picture coding structures. For both the AVC and HEVC standards, the term Picture Order Count (POC) is used to indicate the display order of the coded pictures. However, in AVC, there are three methods to signal POC information (indicated by the variable pic_order_cnt_type), while in HEVC, only one method is allowed, which is the same as pic_order_cnt_type==0 in the AVC case. In an embodiment, when pic_order_cnt_type is not equal to 0 in an AVC bitstream, then the RPU (135) will need to translate it into a POC value that conforms to the HEVC syntax. In an embodiment, an encoder RPU (115) may signal additional POC-related data by using a new pic_order_cnt_lsb variable, as shown in Table 1. In another embodiment, the encoder RPU may simply force the base layer AVC encoder to only use pic_order_cnt_type==0.

TABLE 1 POC syntax

                                        Descriptor
POC( ) {
  pic_order_cnt_lsb                     u(v)
}

In Table 1, pic_order_cnt_lsb specifies the picture order count modulo MaxPicOrderCntLsb for the current inter-layer reference picture. The length of the pic_order_cnt_lsb syntax element is log2_max_pic_order_cnt_lsb_minus4+4 bits. The value of pic_order_cnt_lsb shall be in the range of 0 to MaxPicOrderCntLsb−1, inclusive. When pic_order_cnt_lsb is not present, pic_order_cnt_lsb is inferred to be equal to 0.
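The relationship between a picture order count and the signaled pic_order_cnt_lsb value can be sketched as follows. This is a minimal illustration only: the function name poc_lsb is hypothetical, and it assumes, as in AVC and HEVC, that MaxPicOrderCntLsb equals 2 raised to the power (log2_max_pic_order_cnt_lsb_minus4+4).

```python
from typing import Tuple


def poc_lsb(poc: int, log2_max_pic_order_cnt_lsb_minus4: int) -> Tuple[int, int]:
    """Return (pic_order_cnt_lsb, bit_length) for a picture order count.

    Hypothetical helper; assumes MaxPicOrderCntLsb = 2 ** bit_length.
    """
    bit_length = log2_max_pic_order_cnt_lsb_minus4 + 4
    max_pic_order_cnt_lsb = 1 << bit_length
    # The signaled value is the POC modulo MaxPicOrderCntLsb, so it always
    # lies in the range 0 .. MaxPicOrderCntLsb - 1.
    return poc % max_pic_order_cnt_lsb, bit_length
```

For example, with log2_max_pic_order_cnt_lsb_minus4 equal to 0, the syntax element occupies 4 bits and a POC of 37 is signaled as 5.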

Cropping Window

In AVC coding, the picture resolution must be a multiple of 16. In HEVC, the resolution can be a multiple of 8. When processing an inter-layer reference picture in the RPU, a cropping window might be used to get rid of padded pixels in AVC. If the base layer and the enhancement layer have different spatial resolution (e.g., a base layer is 1920×1080 and the enhancement layer is 4K), or if the picture aspect ratios (PAR) are different (say, 16:9 PAR for the enhancement layer and 4:3 PAR for the base layer), the image has to be cropped and may be resized accordingly. An example of cropping window related RPU syntax is shown in Table 2.

TABLE 2 Picture Cropping Syntax

                                        Descriptor
pic_cropping( ) {
  pic_cropping_flag                     u(1)
  if( pic_cropping_flag ) {
    pic_crop_left_offset                ue(v)
    pic_crop_right_offset               ue(v)
    pic_crop_top_offset                 ue(v)
    pic_crop_bottom_offset              ue(v)
  }
}

In Table 2, pic_cropping_flag equal to 1 indicates that the picture cropping offset parameters follow next. pic_cropping_flag equal to 0 indicates that the picture cropping offset parameters are not present and no cropping is required.

pic_crop_left_offset, pic_crop_right_offset, pic_crop_top_offset, and pic_crop_bottom_offset specify the number of samples in the pictures of the coded video sequence that are input to the RPU decoding process, in terms of a rectangular region specified in picture coordinates for RPU input.
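Reading the pic_cropping( ) structure of Table 2 can be sketched as follows. This is an illustration under stated assumptions, not a normative parser: the BitReader class and parse_pic_cropping function are hypothetical names, u(n) is read as an n-bit unsigned integer with the most significant bit first, ue(v) is read as an unsigned Exp-Golomb code as in AVC and HEVC, and any byte-stream encapsulation of the RPU data is ignored.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_pic_cropping(reader: BitReader) -> dict:
    """Parse pic_cropping( ) in the field order listed in Table 2."""
    params = {"pic_cropping_flag": reader.u(1)}
    if params["pic_cropping_flag"]:
        for name in ("pic_crop_left_offset", "pic_crop_right_offset",
                     "pic_crop_top_offset", "pic_crop_bottom_offset"):
            params[name] = reader.ue()
    return params
```

When pic_cropping_flag is 0, only one bit is consumed and no offsets are returned, which matches the case where no cropping is required.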

Note that since the RPU process is performed for each inter-layer reference, the cropping window parameters can change on a frame-by-frame basis. Adaptive region-of-interest based video retargeting is thus supported using the pan-(zoom)-scan approach.
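Applying a cropping window to one inter-layer reference picture can be sketched as follows. The sketch assumes that each offset counts the samples trimmed from the corresponding picture edge; the function name crop_picture and the list-of-rows picture representation are illustrative assumptions, not part of the described syntax.

```python
def crop_picture(picture, left, right, top, bottom):
    """Return the cropped region of a picture given as a list of sample rows.

    Assumption: each offset is the number of samples removed from that edge.
    """
    height = len(picture)
    width = len(picture[0]) if height else 0
    if left + right >= width or top + bottom >= height:
        raise ValueError("cropping window is empty")
    return [row[left:width - right] for row in picture[top:height - bottom]]
```

Under this assumption, an AVC picture padded to 1088 rows would be reduced to 1080 rows with a bottom offset of 8 and all other offsets equal to 0. Because the function takes the offsets per call, they can differ from one picture to the next.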

FIG. 3 depicts an example of layered coding, where an HD (e.g., 1920×1080) base layer is coded using H.264 and provides a picture that can be decoded by all legacy HD decoders. A lower-resolution (e.g., 640×480) enhancement layer may be used to provide optional support for a “zoom” feature. The EL layer has a smaller resolution than the BL, but may be encoded in HEVC to reduce the overall bit rate. Inter-layer coding, as described herein, may further improve the coding efficiency of this EL layer.

In-Loop Deblocking Filter

Both AVC and HEVC employ a deblocking filter (DF) in the coding and decoding processes. The deblocking filter is intended to reduce the blocking artifacts due to block-based coding, but its design differs considerably between the two standards. In AVC, the deblocking filter is applied on a 4×4 sample grid basis, but in HEVC, the deblocking filter is only applied to the edges which are aligned on an 8×8 sample grid. In HEVC, the strength of the deblocking filter is controlled by the values of several syntax elements similar to AVC, but AVC supports five strengths while HEVC supports only three strengths. In HEVC, there are fewer cases of filtering compared to AVC. For example, for luma, one of three cases is chosen: no filtering, strong filtering and weak filtering. For chroma, there are only two cases: no filtering and normal filtering. To align the deblocking filter operations between the base layer reference picture and a temporal reference picture from the enhancement layer, several approaches can be applied.

In one embodiment, the reference picture without AVC deblocking may be accessed directly by the RPU, with no further post-processing. In another embodiment, the RPU may apply the HEVC deblocking filter to the inter-layer reference picture. The filter decision in HEVC is based on the value of several syntax elements, such as transform coefficients, reference index, and motion vectors. The RPU would become considerably more complex if it had to analyze all of this information to make a filter decision. Instead, one can explicitly signal the filter index on an 8×8 block level, CU (Coding Unit) level, LCU/CTU (Largest Coding Unit or Coding Tree Unit) level, multiple of LCU level, slice level or picture level. One can signal luma and chroma filter indexes separately or they can share the same syntax. Table 3 shows an example of how the deblocking filter decision could be indicated as part of an RPU data stream.

TABLE 3 Deblocking filter syntax

                                        Descriptor
deblocking( rx, ry ) {
  filter_idx                            ue(v)
}

In Table 3, filter_idx specifies the filter index for luma and chroma components. For luma, filter_idx equal to 0 specifies no filtering, filter_idx equal to 1 specifies weak filtering, and filter_idx equal to 2 specifies strong filtering. For chroma, filter_idx equal to 0 or 1 specifies no filtering, and filter_idx equal to 2 specifies normal filtering.
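The shared luma/chroma interpretation of filter_idx can be sketched as a simple lookup. The function name deblocking_modes and the mode strings are illustrative assumptions; only the index-to-mode mapping is taken from the description of Table 3.

```python
# Mapping of filter_idx to filtering mode, per the Table 3 semantics.
LUMA_MODES = {0: "none", 1: "weak", 2: "strong"}
CHROMA_MODES = {0: "none", 1: "none", 2: "normal"}


def deblocking_modes(filter_idx: int):
    """Return (luma_mode, chroma_mode) for a shared filter_idx value."""
    if filter_idx not in LUMA_MODES:
        raise ValueError("filter_idx must be 0, 1, or 2")
    return LUMA_MODES[filter_idx], CHROMA_MODES[filter_idx]
```

With a shared index, a value of 1 filters luma weakly while leaving chroma unfiltered.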

Sample Adaptive Offset (SAO)

SAO is a process which modifies, through a look-up table, the samples after the deblocking filter (DF). As depicted in FIG. 2A and FIG. 2B, it is part of the HEVC standard only. The goal of SAO is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side. In one embodiment, the RPU can process the deblocked or non-deblocked inter-layer reference picture from the AVC base layer using the exact SAO process as described in HEVC. The signaling can be region based, adapted at a CTU (LCU) level, multiple of LCU levels, a slice level, or a picture level. Table 4 shows an example syntax for communicating SAO parameters. In Table 4, the syntax notation is the same as the one described in the HEVC specification.

TABLE 4 Sample Adaptive Offset Syntax

                                                                Descriptor
sao( rx, ry ) {
  if( rx > 0 ) {
    sao_merge_left_flag                                         ue(v)
  }
  if( ry > 0 && !sao_merge_left_flag ) {
    sao_merge_up_flag                                           ue(v)
  }
  if( !sao_merge_up_flag && !sao_merge_left_flag ) {
    for( cIdx = 0; cIdx < 3; cIdx++ ) {
      if( ( slice_sao_luma_flag && cIdx == 0 ) ||
          ( slice_sao_chroma_flag && cIdx > 0 ) ) {
        if( cIdx == 0 )
          sao_type_idx_luma                                     ue(v)
        if( cIdx == 1 )
          sao_type_idx_chroma                                   ue(v)
        if( SaoTypeIdx[ cIdx ][ rx ][ ry ] != 0 ) {
          for( i = 0; i < 4; i++ )
            sao_offset_abs[ cIdx ][ rx ][ ry ][ i ]             ue(v)
          if( SaoTypeIdx[ cIdx ][ rx ][ ry ] == 1 ) {
            for( i = 0; i < 4; i++ ) {
              if( sao_offset_abs[ cIdx ][ rx ][ ry ][ i ] != 0 )
                sao_offset_sign[ cIdx ][ rx ][ ry ][ i ]        ae(v)
            }
            sao_band_position[ cIdx ][ rx ][ ry ]               ae(v)
          } else {
            if( cIdx == 0 )
              sao_eo_class_luma                                 ae(v)
            if( cIdx == 1 )
              sao_eo_class_chroma                               ae(v)
          }
        }
      }
    }
  }
}

Adaptive Loop Filter (ALF)

During the development of HEVC, an adaptive loop filter (ALF) was also evaluated as a processing block following SAO; however, ALF is not part of the first version of HEVC. Since ALF processing can improve inter-layer coding, if implemented by a future encoder, it is another processing step that could be implemented by the RPU as well. The adaptation of ALF can be region based, adapted at a CTU (LCU) level, multiple of LCU levels, a slice level, or a picture level. An example of ALF parameters is described by alf_picture_info( ) in, “High efficiency video coding (HEVC) text specification draft 7,” by B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, and T. Wiegand, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-I1003, May 2012, which is incorporated herein by reference in its entirety.

Interlaced and Progressive Scanning

AVC supports coding tools for both progressive and interlaced content. For interlaced sequences, it allows both frame coding and field coding. In HEVC, no explicit coding tools are present to support the use of interlaced scanning. HEVC provides only metadata syntax (Field Indication SEI message syntax and VUI) to allow an encoder to indicate how interlaced content was coded. The following scenarios are considered.

Scenario 1: Both the Base Layer and the Enhancement Layer are Interlaced

For this scenario, several methods can be considered. In a first embodiment, the encoder may be constrained to change the base layer encoding in a frame or field mode only on a per sequence basis. The enhancement layer will follow the coding decision from the base layer. That is, if the AVC base layer uses field coding in one sequence, the HEVC enhancement layer will use field coding in the corresponding sequence too. Similarly, if the AVC base layer uses frame coding in one sequence, the HEVC enhancement layer will use frame coding in the corresponding sequence too. It is noted that for field coding, the vertical resolution signaled in the AVC syntax is the frame height; however, in HEVC, the vertical resolution signaled in the syntax is the field height. Special care must be taken in communicating this information in the bit stream, especially if a cropping window is used.

In another embodiment, the AVC encoder may use picture-level adaptive frame or field coding, while the HEVC encoder performs sequence-level adaptive frame or field coding. In both cases, the RPU can process inter-layer reference pictures in one of the following ways: a) the RPU may process the inter-layer reference picture as fields, regardless of the frame or field coding decision in the AVC base layer, or b) the RPU may adapt the processing of the inter-layer reference pictures based on the frame/field coding decision in the AVC base layer. That is, if the AVC base layer is frame-coded, the RPU will process the inter-layer reference picture as a frame; otherwise, it will process the inter-layer reference picture as fields.

FIG. 4 depicts an example of Scenario 1. The notation Di or Dp denotes frame rate and whether the format is interlaced or progressive. Thus, Di denotes D interlaced frames per second (or 2D fields per second) and Dp denotes D progressive frames per second. In this example, the base layer comprises a standard-definition (SD) 720×480, 30i sequence coded using AVC. The enhancement layer is a high-definition (HD) 1920×1080, 60i sequence, coded using HEVC. This example incorporates codec scalability, temporal scalability, and spatial scalability. Temporal scalability is handled by the enhancement layer HEVC decoder using a hierarchical structure with temporal prediction only (this mode is supported by HEVC in a single layer). Spatial scalability is handled by the RPU, which adjusts and synchronizes slices of the inter-layer reference field/frame with its corresponding field/frame slices in the enhancement layer.

Scenario 2: The Base Layer is Interlaced and the Enhancement Layer is Progressive

In this scenario, the AVC base layer is an interlaced sequence and the HEVC enhancement layer is a progressive sequence. FIG. 5A depicts an example embodiment wherein an input 4K 120p signal (502) is encoded as three layers: a 1080 30i BL stream (532), a first enhancement layer (EL0) stream (537), coded as 1080 60p, and a second enhancement layer stream (EL1) (517), coded as 4K 120p. The BL and EL0 signals are coded using an H.264/AVC encoder while the EL1 signal may be coded using HEVC. On the encoder, starting with a high-resolution, high-frame-rate 4K 120p signal (502), the encoder applies temporal and spatial down-sampling (510) to generate a progressive 1080 60p signal 512. Using a complementary progressive to deinterlacing technique (520), the encoder may also generate two complementary, 1080 30i, interlaced signals BL 522-1 and EL0 522-2. As used herein, the term “complementary progressive to deinterlacing technique” denotes a scheme that generates two interlaced signals from the same progressive input, where both interlaced signals have the same resolution, but one interlaced signal includes the fields from the progressive signal that are not part of the second interlaced signal. For example, if the input signal at time T_i, i=0, 1, . . . , n, is divided into top and bottom interlaced fields (Top-T_i, Bottom-T_i), then the first interlaced signal may be constructed using (Top-T₀, Bottom-T₁), (Top-T₂, Bottom-T₃), etc., while the second interlaced signal may be constructed using the remaining fields, that is: (Top-T₁, Bottom-T₀), (Top-T₃, Bottom-T₂), etc.
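The field pairing of the complementary scheme can be sketched as follows. The function name split_complementary is hypothetical, and the sketch assumes that each progressive frame has already been separated into a (top field, bottom field) pair and that frames are consumed two at a time, so that the first signal pairs the top field at an even time index with the bottom field at the following odd index, and the second signal takes the remaining fields.

```python
def split_complementary(frames):
    """Split progressive frames into two complementary interlaced signals.

    frames: list of (top_field, bottom_field) pairs for times T_0 .. T_n.
    Returns (first_signal, second_signal), each a list of (top, bottom) pairs.
    A trailing unpaired frame, if any, is ignored in this sketch.
    """
    first, second = [], []
    for i in range(0, len(frames) - 1, 2):
        top_even, bottom_even = frames[i]
        top_odd, bottom_odd = frames[i + 1]
        first.append((top_even, bottom_odd))   # e.g. (Top-T0, Bottom-T1)
        second.append((top_odd, bottom_even))  # e.g. (Top-T1, Bottom-T0)
    return first, second
```

Every input field appears in exactly one of the two outputs, and each output carries half as many frames as the input, which is consistent with deriving two 30i signals from one 60p signal.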

In this example, the BL signal 522-1 is a backward-compatible interlaced signal that can be decoded by legacy decoders, while the EL0 signal 522-2 represents the complementary samples from the original progressive signal. For the final picture composition at the full frame rate, every reconstructed field picture from the BL signal must be combined with a field picture within the same access unit but with the opposite field parity. Encoder 530 may be an AVC encoder that comprises two AVC encoders (530-1 and 530-2) and RPU processor 530-3. Encoder 530 may use inter-layer processing to compress signal EL0 using reference frames from both the BL and the EL0 signals. RPU 530-3 may be used to prepare the BL reference frames used by the 530-2 encoder. It may also be used to create progressive signal 537, to be used for the coding of the EL1 signal 502 by EL1 encoder 515.

In an embodiment, an up-sampling process in the RPU (535) is used to convert the 1080 60p output (537) from RPU 530-3 into a 4K 60p signal to be used by HEVC encoder 515 during inter-layer prediction. EL1 signal 502 may be encoded using temporal and spatial scalability to generate a compressed 4K 120p stream 517. Decoders can apply a similar process to decode either a 1080 30i signal, a 1080 60p signal, or a 4K 120p signal.

FIG. 5B depicts another example implementation of an interlaced/progressive system according to an embodiment. This is a two-layer system, where a 1080 30i base layer signal (522) is encoded using an AVC encoder (540) to generate a coded BL stream 542, and a 4K 120p enhancement layer signal (502) is encoded using an HEVC encoder (515) to generate a coded EL stream 552. These two streams may be multiplexed to form a coded scalable bit stream 572.

As depicted in FIG. 5B, RPU 560 may comprise two processes: a de-interlacing process, which converts BL 522 to a 1080 60p signal, and an up-sampling process to convert the 1080 60p signal back to a 4K 60p signal, so the output of the RPU may be used as a reference signal during inter-layer prediction in encoder 515.
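As a rough illustration of the two RPU stages, the following Python sketch converts each interlaced field into a full-height frame and then doubles its resolution. The specific filters are assumptions chosen only for brevity: line doubling (“bob”) stands in for the de-interlacing process and nearest-neighbor replication stands in for the up-sampling process; the description does not prescribe either. All function names are hypothetical.

```python
def bob_deinterlace(field):
    """Line-double one field (a list of rows) into a full-height frame.

    Assumption: simple line repetition; a practical RPU may use a
    more elaborate de-interlacing filter.
    """
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))
    return frame


def upsample_2x(frame):
    """Nearest-neighbor 2x up-sampling in both dimensions (an assumption)."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def rpu_reference(fields):
    """Produce one up-sampled progressive reference frame per input field."""
    return [upsample_2x(bob_deinterlace(field)) for field in fields]
```

Because every field yields one progressive frame, a 30i input (60 fields per second) produces a 60p reference sequence, which matches the frame rates named above.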

Scenario 3: The Base Layer is Progressive and the Enhancement Layer is Interlaced

In this scenario, in one embodiment, the RPU may convert the progressive inter-layer reference picture into an interlaced picture. These interlaced pictures can be processed by the RPU as a) always fields, regardless of whether the HEVC encoder uses sequence-based frame or field coding, or as b) fields or frames, depending on the mode used by the HEVC encoder. Table 5 depicts an example syntax that can be used to guide the decoder RPU about the encoder process.

TABLE 5 Interlace Processing Syntax

interlace_process( ) {            Descriptor
  base_field_seq_flag             u(1)
  enh_field_seq_flag              u(1)
}

In Table 5, base_field_seq_flag equal to 1 indicates that the base layer coded video sequence conveys pictures that represent fields. base_field_seq_flag equal to 0 indicates that the base layer coded video sequence conveys pictures that represent frames.

enh_field_seq_flag equal to 1 indicates that the enhancement layer coded video sequence conveys pictures that represent fields. enh_field_seq_flag equal to 0 indicates that the enhancement layer coded video sequence conveys pictures that represent frames.

Table 6 shows how an RPU may process the reference pictures based on the base_field_seq_flag and enh_field_seq_flag flags.

TABLE 6 RPU processing for progressive/interlaced scanning sequences

base_field_seq_flag   enh_field_seq_flag   RPU processing
1                     1                    field
1                     0                    De-interlacing + frame
0                     1                    Interlacing + field
0                     0                    frame
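The mapping of Table 6 may be expressed as a simple lookup keyed on the two flags of Table 5. The Python sketch below is illustrative only; the function name and the returned strings are hypothetical labels for the four processing modes.

```python
def rpu_scan_processing(base_field_seq_flag, enh_field_seq_flag):
    """Return the RPU processing mode of Table 6 for a pair of flag values."""
    table = {
        (1, 1): "field",
        (1, 0): "de-interlacing + frame",
        (0, 1): "interlacing + field",
        (0, 0): "frame",
    }
    return table[(base_field_seq_flag, enh_field_seq_flag)]
```

For example, an interlaced base layer paired with a progressive enhancement layer (flags 1 and 0) selects de-interlacing followed by frame processing.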

Signal Encoding Model Scalability

Gamma-encoding is arguably the most widely used signal encoding model, due to its efficiency for representing standard dynamic range (SDR) images. In recent research for high-dynamic range (HDR) imaging, it was found that for several types of images, other signal encoding models, such as the Perceptual Quantizer (PQ) described in “Parameter values for UHDTV”, a submission to SG6 WP 6C, WP6C/USA002, by Craig Todd, or U.S. Provisional patent application with Ser. No. 61/674,503, filed on Jul. 23, 2012, and titled “Perceptual luminance nonlinearity-based image data exchange across different display capabilities,” by Jon S. Miller et al., both incorporated herein by reference in their entirety, could represent the data more efficiently. Therefore, it is possible that a scalable system may have one layer of SDR content which is gamma-coded, and another layer of high dynamic range content which is coded using other signal encoding models.

FIG. 6 depicts an embodiment where RPU 610 (e.g., RPU 115 in FIG. 1) may be set to adjust the signal quantizer of the base layer. Given a BL signal 102 (e.g., 8-bit, SDR video signal, gamma encoded in 4:2:0 Rec. 709), and an EL signal 104 (e.g., 12-bit HDR video signal, PQ encoded in 4:4:4 in P3 color space), processing in RPU 610 may comprise: gamma decoding, other inverse mappings (e.g., color space conversions, bit-depth conversions, chroma sampling, and the like), and SDR to HDR perceptual quantization (PQ). The signal decoding and encoding method (e.g., gamma and PQ), and related parameters, may be part of metadata that are transmitted together with the coded bitstream or they can be part of a future HEVC syntax. Such RPU processing may be combined with other RPU processing related to other types of scalabilities, such as bit-depth, chroma format, and color space scalability. As depicted in FIG. 1, similar RPU processing may also be performed by a decoder RPU during the decoding of the scalable bit stream 127.
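A minimal Python sketch of the gamma-decoding and PQ re-quantization steps for a single luma code value follows. Several choices are assumptions made only for illustration and are not stated in this description: a pure power-law gamma of 2.4, an SDR peak luminance of 100 nits, full-range code values, and the PQ constants as later standardized in SMPTE ST 2084. Color space, chroma sampling, and other inverse mappings are omitted, and all function names are hypothetical.

```python
# PQ constants (assumption: values as standardized in SMPTE ST 2084).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def gamma_decode(code, bit_depth=8, gamma=2.4):
    """Map a gamma-coded integer to normalized linear light in [0, 1].

    Assumption: pure power-law gamma and full-range code values.
    """
    return (code / ((1 << bit_depth) - 1)) ** gamma


def pq_encode(nits):
    """Map absolute luminance (cd/m^2) to a normalized PQ value in [0, 1]."""
    y = max(nits, 0.0) / 10000.0
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def sdr_to_pq_code(code, sdr_peak_nits=100.0, out_bit_depth=12):
    """Convert an 8-bit gamma-coded value to a 12-bit PQ-coded value.

    Assumption: the SDR signal peaks at sdr_peak_nits.
    """
    linear = gamma_decode(code)
    value = pq_encode(linear * sdr_peak_nits)
    return round(value * ((1 << out_bit_depth) - 1))
```

Under these assumptions, the full 8-bit SDR range occupies roughly the lower half of the 12-bit PQ code range, leaving the upper codes for highlights that only the HDR layer represents.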

Scalability extension can include several other categories, such as: spatial or SNR scalability, temporal scalability, bit-depth scalability, and chroma resolution scalability. Hence, an RPU can be configured to process inter-layer reference pictures under a variety of coding scenarios. For better encoder-decoder compatibility, encoders may incorporate special RPU-related bit stream syntax to guide the corresponding RPU decoder. The syntax can be updated at a variety of coding levels, including: the slice level, the picture level, the GOP level, the scene level, or at the sequence level. It also can be included in a variety of auxiliary data, such as: the NAL unit header, Sequence Parameter Set (SPS) and its extension, SubSPS, Picture Parameter Set (PPS), slice header, SEI message, or a new NAL unit header. Since there may be a lot of RPU-related processing tools, for maximum flexibility and ease of implementation, in one embodiment, we propose to reserve a new NAL unit type for the RPU to make it a separate bitstream. Under such an implementation, a separate RPU module is added to the encoder and decoder modules to interact with the base layer and the one or more enhancement layers. Table 7 shows an example of RPU data syntax which includes rpu_header_data( ) (shown in Table 8) and rpu_payload_data( ) (shown in Table 9), in a new NAL unit. In this example, multiple partitions are enabled to allow region-based deblocking and SAO decisions.

TABLE 7 RPU data syntax

rpu_data( ) {                     Descriptor
  rpu_header_data( )
  rpu_payload_data( )
  rbsp_trailing_bits( )
}

TABLE 8 RPU header data syntax

rpu_header_data( ) {              Descriptor
  rpu_type                        u(6)
  POC( )
  pic_cropping( )
  deblocking_present_flag         u(1)
  sao_present_flag                u(1)
  alf_present_flag                u(1)
  if (alf_present_flag)
    alf_picture_info( )
  interlace_process( )
  num_x_partitions_minus1         ue(v)
  num_y_partitions_minus1         ue(v)
}

TABLE 9 RPU payload data syntax

rpu_payload_data( ) {                                        Descriptor
  for (y = 0; y <= num_y_partitions_minus1; y++) {
    for (x = 0; x <= num_x_partitions_minus1; x++) {
      if (deblocking_present_flag)
        deblocking( )
      if (sao_present_flag)
        sao( )
      /* below is to add other parameters related to upsampling filter, mapping, etc. */
      /* example 1: if (rpu_type == SPATIAL_SCALABILITY) */
      /*   rpu_process_spatial_scalability( ) */
      /* example 2: if (rpu_type == BIT_DEPTH_SCALABILITY) */
      /*   rpu_process_bit_depth_scalability( ) */
      ....
    }
  }
}

In Table 8, rpu_type specifies the prediction type purpose for the RPU signal. It can be used to specify different kinds of scalability. For example, rpu_type equal to 0 may specify spatial scalability, and rpu_type equal to 1 may specify bit-depth scalability. In order to combine different scalability modes, one may also use a masking variable, such as rpu_mask. For example, rpu_mask=0x01 (binary 00000001) may denote that only spatial scalability is enabled. rpu_mask=0x02 (binary 00000010) may denote that only bit-depth scalability is enabled. rpu_mask=0x03 (binary 00000011) may denote that both spatial and bit-depth scalability are enabled.
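The rpu_mask example above may be sketched in Python as a bitwise test against one bit per scalability mode. The two bit assignments follow the example values given above; the constant names, the dictionary, and the function name are hypothetical.

```python
# Bit assignments taken from the rpu_mask example values; names are hypothetical.
SPATIAL_SCALABILITY = 0x01
BIT_DEPTH_SCALABILITY = 0x02

MODE_NAMES = {
    SPATIAL_SCALABILITY: "spatial",
    BIT_DEPTH_SCALABILITY: "bit-depth",
}


def enabled_modes(rpu_mask):
    """List the scalability modes whose bits are set in rpu_mask."""
    return [name for bit, name in MODE_NAMES.items() if rpu_mask & bit]
```

A mask leaves room for further modes, each taking the next free bit, without changing how existing masks are interpreted.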

deblocking_present_flag equal to 1 indicates that syntax related to the deblocking filter is present in the RPU data.

sao_present_flag equal to 1 indicates that syntax related to SAO is present in the RPU data.

alf_present_flag equal to 1 indicates that syntax related to the ALF filter is present in the RPU data.

num_x_partitions_minus1 plus 1 signals the number of partitions that are used to subdivide the processed picture in the horizontal dimension in the RPU.

num_y_partitions_minus1 plus 1 signals the number of partitions that are used to subdivide the processed picture in the vertical dimension in the RPU.
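To illustrate how the u(n) and ue(v) descriptors of Table 8 may be read, the following Python sketch parses a simplified RPU header. It assumes that u(n) denotes an n-bit unsigned integer read most significant bit first and that ue(v) denotes an unsigned Exp-Golomb code, as in H.264/AVC and HEVC. The POC( ), pic_cropping( ), and alf_picture_info( ) structures are not detailed in this section, so the sketch omits them; this simplification, the BitReader class, and the function name are assumptions for illustration only.

```python
class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n):
        """Read an n-bit unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self):
        """Read an unsigned Exp-Golomb coded integer."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def parse_simplified_rpu_header(data):
    """Parse the Table 8 fields, skipping POC( ) and pic_cropping( ) (assumption)."""
    r = BitReader(data)
    hdr = {"rpu_type": r.u(6)}
    hdr["deblocking_present_flag"] = r.u(1)
    hdr["sao_present_flag"] = r.u(1)
    hdr["alf_present_flag"] = r.u(1)
    if hdr["alf_present_flag"]:
        # alf_picture_info( ) is not defined in this sketch.
        raise NotImplementedError("alf_picture_info( ) is not modeled")
    hdr["base_field_seq_flag"] = r.u(1)  # interlace_process( ), Table 5
    hdr["enh_field_seq_flag"] = r.u(1)
    hdr["num_x_partitions_minus1"] = r.ue()
    hdr["num_y_partitions_minus1"] = r.ue()
    return hdr
```

With the parsed values, the payload loops of Table 9 visit (num_x_partitions_minus1 + 1) × (num_y_partitions_minus1 + 1) partitions.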

In another embodiment, instead of using POC to synchronize the base layer and enhancement layer pictures, the RPU syntax is signaled at the picture level, so multiple pictures can reuse the same RPU syntax, which results in lower bit overhead and possibly reduced processing overhead in some implementations. Under this implementation, the rpu_id will be added into the RPU syntax. The slice_header( ) will always refer to rpu_id to synchronize the RPU syntax with the current slice, where the rpu_id variable identifies the rpu_data( ) that is referred to in the slice header.
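One way to picture the rpu_id mechanism is a table of previously received rpu_data( ) structures, indexed by rpu_id, that each slice header looks up. The Python sketch below assumes that slice headers and RPU data are represented as dictionaries; the RpuStore class and its method names are hypothetical.

```python
class RpuStore:
    """Keeps received rpu_data( ) structures, keyed by rpu_id (hypothetical)."""

    def __init__(self):
        self._by_id = {}

    def store(self, rpu_id, rpu_data):
        """Record or replace the RPU data signaled with this rpu_id."""
        self._by_id[rpu_id] = rpu_data

    def for_slice(self, slice_header):
        """Return the RPU data referenced by the slice header's rpu_id."""
        return self._by_id[slice_header["rpu_id"]]
```

Several pictures whose slice headers carry the same rpu_id then share one stored rpu_data( ), which is the source of the bit savings noted above.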

FIG. 7 depicts an example encoding process according to an embodiment. Given a series of pictures (or frames), the encoder encodes a base layer with a BL encoder using a first compression standard (e.g., AVC) (715). Next (720, 725), as depicted in FIGS. 2A and 2B, RPU process 115 may access base layer pictures either before or after the deblocking filter (DF). The decision can be made based on RD (rate-distortion) optimization or the processing that the RPU performs. For example, if the RPU performs up-sampling, which may also be used in deblocking the block boundaries, then the RPU may just use the decoded base layer before the deblocking filter, and the up-sampling process may retain more details. RPU 115 may determine the RPU processing parameters based on the BL and EL coding parameters. If needed, the RPU process may also access data from the EL input. Then, in step 730, the RPU processes the inter-layer reference pictures according to the determined RPU process parameters. The generated inter-layer pictures (735) may now be used by the EL encoder using a second compression standard (e.g., an HEVC encoder) to compress the enhancement layer signal.

FIG. 8 depicts an example decoding process according to an embodiment. First (810), the decoder parses the high-level syntax of the input bitstream to extract sequence parameters and RPU-related information. Next (820), it decodes the base layer with a BL decoder according to the first compression standard (e.g., an AVC decoder). After decoding the RPU-process related parameters (825), the RPU process generates inter-layer reference pictures according to these parameters (steps 830 and 835). Finally, the decoder decodes the enhancement layer using an EL decoder that complies with the second compression standard (e.g., an HEVC decoder) (840).

Given the example RPU parameters defined in Tables 1-9, FIG. 9 depicts an example decoding RPU process according to an embodiment. First (910), the decoder extracts from the bitstream syntax the high-level RPU-related data, such as RPU type (e.g., rpu_type in Table 8), POC( ), and pic_cropping( ). The term “RPU type” refers to RPU-related sub-processes that need to be considered, such as: coding-standard scalability, spatial scalability, bit-depth scalability, and the like, as discussed earlier. Given a BL frame, cropping and ALF-related operations may be processed first (e.g., 915, 925). Next, after extracting the required interlaced or deinterlaced mode (930), for each partition, the RPU performs deblocking and SAO-related operations (e.g., 935, 940). If additional RPU processing needs to be performed (945), then the RPU decodes the appropriate parameters (950) and then performs operations according to these parameters. At the end of this process, a sequence of inter-layer frames is available to the EL decoder to decode the EL stream.

Example Computer System Implementation

Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control or execute instructions relating to RPU processing, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to RPU processing as described herein. The RPU-related embodiments may be implemented in hardware, software, firmware and various combinations thereof.

Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement RPU processing methods as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.

Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.

Equivalents, Extensions, Alternatives and Miscellaneous

Example embodiments that relate to RPU processing and standards-based codec scalability are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set as recited in claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

1-18. (canceled)
19. A method for decoding a video stream by a decoder, the method comprising: accessing a base layer picture; receiving a picture cropping flag in the video stream indicating that offset cropping parameters are present; and in response to receiving the picture cropping flag indicating that the offset cropping parameters are present: accessing the offset cropping parameters; cropping one or more regions of the base layer picture according to the accessed offset cropping parameters to generate a cropped reference picture; and generating a reference picture for an enhancement layer according to the cropped reference picture.
20. The method of claim 19, wherein the base layer picture is in a first spatial resolution, and wherein generating the reference picture comprises scaling the cropped reference picture from the first spatial resolution to a second spatial resolution such that the reference picture for the enhancement layer is in the second spatial resolution.
21. The method of claim 19, wherein the offset cropping parameters are updated on a frame-by-frame basis in the video stream.
22. The method of claim 19, further comprising detecting that the picture cropping flag is set to a predetermined value.
23. The method of claim 22, wherein the predetermined value is 1.
24. The method of claim 19, wherein the offset cropping parameters comprise a left offset, a right offset, a top offset, and a bottom offset.
25. A decoder for decoding a video stream, comprising: one or more processors configured to: access a base layer picture; receive a picture cropping flag in the video stream indicating that offset cropping parameters are present; and in response to receiving the picture cropping flag indicating that the offset cropping parameters are present: access the offset cropping parameters; crop one or more regions of the base layer picture according to the accessed offset cropping parameters to generate a cropped reference picture; and generate a reference picture for an enhancement layer according to the cropped reference picture.
26. The decoder of claim 25, wherein the base layer picture is in a first spatial resolution, and wherein generating the reference picture comprises scaling the cropped reference picture from the first spatial resolution to a second spatial resolution such that the reference picture for the enhancement layer is in the second spatial resolution.
27. The decoder of claim 25, wherein the offset cropping parameters are updated on a frame-by-frame basis in the video stream.
28. The decoder of claim 25, further comprising detecting that the picture cropping flag is set to a predetermined value.
29. The decoder of claim 28, wherein the predetermined value is 1.
30. The decoder of claim 25, wherein the offset cropping parameters comprise a left offset, a right offset, a top offset, and a bottom offset.
31. A computer-readable medium coupled to one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations comprising: accessing a base layer picture; receiving a picture cropping flag in a video stream indicating that offset cropping parameters are present; and in response to receiving the picture cropping flag indicating that the offset cropping parameters are present: accessing the offset cropping parameters; cropping one or more regions of the base layer picture according to the accessed offset cropping parameters to generate a cropped reference picture; and generating a reference picture for an enhancement layer according to the cropped reference picture.
32. The computer-readable medium of claim 31, wherein the base layer picture is in a first spatial resolution, and wherein generating the reference picture comprises scaling the cropped reference picture from the first spatial resolution to a second spatial resolution such that the reference picture for the enhancement layer is in the second spatial resolution.
33. The computer-readable medium of claim 31, wherein the offset cropping parameters are updated on a frame-by-frame basis in the video stream.
34. The computer-readable medium of claim 31, further comprising detecting that the picture cropping flag is set to a predetermined value.
35. The computer-readable medium of claim 34, wherein the predetermined value is 1.
36. The computer-readable medium of claim 31, wherein the offset cropping parameters comprise a left offset, a right offset, a top offset, and a bottom offset.